Abstract

Mining discriminative features for graph data has attracted much
attention in recent years due to its important role in constructing
graph classifiers, generating graph indices, etc. Most measures of
interestingness for discriminative subgraph features are defined on
certain graphs, where the structure of each graph object is certain
and the binary edges within each graph represent the "presence" of
linkages among the nodes. In many real-world applications, however,
the linkage structure of the graphs is inherently uncertain.
Therefore, existing measures of interestingness based upon certain
graphs are unable to capture the structural uncertainty in these
applications effectively. In this paper, we study the problem of
discriminative subgraph feature selection from uncertain graphs. This
problem is challenging and different from conventional subgraph
mining problems because both the structure of the graph objects and
the discrimination score of each subgraph feature are uncertain. To
address these challenges, we propose a novel discriminative subgraph
feature selection method, Dug, which can find discriminative subgraph
features in uncertain graphs based upon different statistical
measures including expectation, median, mode and
$\varphi$-probability. We first compute the probability distribution
of the discrimination scores for each subgraph feature based on
dynamic programming. Then a branch-and-bound algorithm is proposed to
search for discriminative subgraphs efficiently. Extensive
experiments on various neuroimaging applications (i.e., Alzheimer's
Disease, ADHD and HIV) have been performed to analyze the gain in
performance by taking into account structural uncertainties in
identifying discriminative subgraph features for graph
classification.
1 Introduction

Graphs arise naturally in many scientific applications 
which involve complex structures in the data, e.g., 
chemical compounds, program flows, etc. Different from 
traditional data with flat features, these data are usu- 
ally not directly represented as feature vectors, but 
as graphs with nodes and edges. Mining discrimina- 
tive features for graph data has attracted much atten- 
tion in recent years due to its important role in con- 
structing graph classifiers, generating graph indices, etc. 
[22, 20]. Much of the past research in discriminative
subgraph feature mining has focused on certain
graphs, where the structure of the graph objects is
certain, and the binary edges represent the "presence"
of linkages between the nodes. Conventional subgraph
mining methods [22] utilize the structures of the certain 
graphs to find discriminative subgraph features. How- 
ever, in many real-world applications, there is inherent 
uncertainty about the graph linkage structure. Such 
uncertainty information will be lost if we directly trans- 
form uncertain graphs into certain graphs. 

For example, in neuroimaging, the functional connectivities
among different brain regions are highly uncertain
[6, 3, 25]. In such applications, each human
brain can be represented as an uncertain graph, as shown
in Figure 1, which is also called the "brain network"
[2]. In such brain networks, the nodes represent brain
regions, and edges represent the probabilistic connec- 
tions, e.g., resting-state functional connectivity in fMRI 
(functional Magnetic Resonance Imaging). Since these 
functional connectivities are derived based upon pro- 
cessing steps, such as temporal correlations in sponta- 
neous blood oxygen level-dependent (BOLD) signal os- 
cillations, each edge of the brain network is associated 
with a probability to quantify the likelihood that the 
functional connection exists in the brain. Resting-state 
functional connectivity has shown alterations related 
to many neurological diseases, such as ADHD (Atten- 
tion Deficit Hyperactivity Disorder), Alzheimer's dis- 
ease and virus infections that may affect the brain func- 
tioning, such as HIV [21]. Researchers are interested 
in analyzing the complex structure and uncertain con- 
nectivities of the human brain to find biomarkers for 
neurological diseases. Such biomarkers are clinically
imperative for detecting injury to the brain in the earliest
stages before it is irreversible. Valid biomarkers can be
used to aid diagnosis, monitor disease progression and
evaluate effects of intervention.

Figure 1: An example of uncertain graph classification
task. (a) positive uncertain graph; (b) negative uncertain graph.

Motivated by these real-world neuroimaging appli- 
cations, in this paper, we study the problem of min- 
ing discriminative subgraph features in uncertain graph 
datasets. Discriminative subgraph features are funda- 
mental for uncertain graphs, just as they are for cer- 
tain graphs. They serve as primitive features for the 
classification tasks on uncertain graph objects. Despite 
the value and significance, the discriminative subgraph 
mining for uncertain graph classification has not been 
studied in this context. If we consider discriminative 
subgraph mining and uncertain graph structures as a 
whole, the major research challenges are as follows: 
Structural Uncertainty: In discriminative subgraph 
mining, we need to estimate the discrimination score 
of a subgraph feature in order to select a set of sub- 
graphs that are most discriminative for a classification 
task. In conventional subgraph mining, the discrimina- 
tion scores of subgraph features are defined on certain 
graphs, where the structure of each graph object is cer- 
tain, and thus the containment relationships between 
subgraph features and graph objects are also certain. 
However, when uncertainty is present in the structures
of graphs, a subgraph feature exists within
a graph object only with some probability. Thus the
discrimination scores of a subgraph feature are no longer
deterministic values, but random variables with probability
distributions.

Figure 2: Different types of subgraph features for
uncertain graph classification.

Thus, the evaluation of discrimination scores for
subgraph features in uncertain graphs is different from
conventional subgraph mining problems. For example,
in Figure 2 we show an uncertain graph dataset containing
4 uncertain graphs $\widetilde{G}_1, \dots, \widetilde{G}_4$ with their class
labels, $+$ or $-$. Subgraph $g_1$ is a frequent pattern among
the uncertain graphs, but it may not relate to the class
labels of the graphs. Subgraph $g_2$ is a discriminative
subgraph feature when we ignore the edge uncertainties.
However, if such uncertainties are considered, we
will find that $g_2$ can rarely be observed within the uncertain
graph dataset, and thus will not be useful in graph
classification. Accordingly, $g_3$ is the best subgraph feature
for uncertain graph classification.
Efficiency & Robustness: There are two additional 
problems that need to be considered when evaluating 
features for uncertain graphs: 1) In an uncertain graph 
dataset, there are an exponentially large number of 
possible instantiations of a graph dataset [35]. How 
can we efficiently compute the discrimination score of 
a subgraph feature without enumerating all possible 
implied datasets? 2) When evaluating the subgraph 
features, we should choose a statistical measure for
the probability distribution of discrimination scores
which is robust to extreme values. For example, given
a subgraph feature with (score, probability) pairs
(0.01, 99.99%) and ($+\infty$, 0.01%), the expected score
of the subgraph is $+\infty$, although this value is only
associated with a very tiny probability.

In order to address the above problems, we pro- 
pose a general framework for mining discriminative sub- 
graph features in uncertain graph datasets, which is 
called Dug (Discriminative feature selection for Uncer- 
tain Graph classification). The Dug framework can ef- 
fectively find a set of discriminative subgraph features 
by considering the relationship between uncertain graph 
structures and labels based upon various statistical mea- 
sures. We propose an efficient method to calculate the 
probability distribution of the scoring function based on 
dynamic programming. Then a branch-and-bound algo- 
rithm is proposed to search for the discriminative sub- 
graphs efficiently by pruning the subgraph search space. 
Empirical studies on resting-state fMRI images of dif- 
ferent brain diseases (i.e., Alzheimer's Disease, ADHD 
and HIV) demonstrate that the proposed method can 
obtain better accuracy on uncertain graph classification
tasks than alternative approaches.

For the rest of the paper, we first introduce preliminaries
in Section 2. Then we introduce our Dug subgraph
mining framework in Section 3. Discrimination
score functions based upon different statistical measures
are discussed in Section 3.1. An efficient algorithm for
computing the score distribution based upon dynamic
programming is proposed in Section 3.2. Experimental
results are discussed in Section 4. Finally, we
conclude the paper.

2 Problem Formulation 

In this section, we formally define the model of uncertain 
graphs and the problem of discriminative subgraph 
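mining in uncertain graph datasets. Suppose we are
given an uncertain graph dataset $\widetilde{\mathcal{D}} = \{\widetilde{G}_1, \dots, \widetilde{G}_n\}$
that consists of $n$ uncertain graphs. $\widetilde{G}_i$ is the $i$-th
uncertain graph in $\widetilde{\mathcal{D}}$. $\mathbf{y} = [y_1, \dots, y_n]^\top$ denotes the
vector of class labels, where $y_i \in \{+1, -1\}$ is the
class label of $\widetilde{G}_i$. We also denote the subsets of $\widetilde{\mathcal{D}}$
that contain only positive/negative graphs as
$\widetilde{\mathcal{D}}_+ = \{\widetilde{G}_i \mid \widetilde{G}_i \in \widetilde{\mathcal{D}} \wedge y_i = +1\}$ and
$\widetilde{\mathcal{D}}_- = \{\widetilde{G}_i \mid \widetilde{G}_i \in \widetilde{\mathcal{D}} \wedge y_i = -1\}$ respectively.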

Definition 1. (Certain Graph) A certain graph is
an undirected and deterministic graph represented as
$G = (V, E)$. $V = \{v_1, \dots, v_{n_v}\}$ is the set of vertices.
$E \subseteq V \times V$ is the set of deterministic edges.

Definition 2. (Uncertain Graph) An uncertain
graph is an undirected and nondeterministic graph
represented as $\widetilde{G} = (V, E, p)$. $V = \{v_1, \dots, v_{n_v}\}$ is the
set of vertices. $E \subseteq V \times V$ is the set of nondeterministic
edges. $p : E \to (0, 1]$ is a function that assigns a
probability of existence to each edge in $E$. $p(e)$ denotes
the existence probability of edge $e \in E$.

Consider an uncertain graph $\widetilde{G}(V, E, p) \in \widetilde{\mathcal{D}}$, where
each edge $e \in E$ is associated with a probability
$p(e)$ of being present. As in previous work,
we assume that the uncertainty variables of different
edges in an uncertain graph are independent of each
other, though most of our results are still applicable
to graphs with edge correlations. We further assume
that all uncertain graphs in a dataset $\widetilde{\mathcal{D}}$ share the same
set of nodes $V$ and each node in $V$ has a unique node
label, which is reasonable in many applications like
neuroimaging, since each human brain consists of the
same number of regions. The main difference between
different uncertain graphs lies in their linkage structures,
i.e., the edge sets $E(\widetilde{G})$ and the edge probabilities $p(e)$.

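Possible instantiations of an uncertain graph $\widetilde{G}$ are
usually referred to as worlds of $\widetilde{G}$, where each world
corresponds to an implied certain graph $G$. Here $G$ is
implied from uncertain graph $\widetilde{G}$ (denoted as $\widetilde{G} \Rightarrow G$),
iff all edges in $E(G)$ are sampled from $E(\widetilde{G})$ according
to their probabilities of existence $p(e)$ and $E(G) \subseteq E(\widetilde{G})$.
There are $2^{|E(\widetilde{G})|}$ possible worlds for uncertain
graph $\widetilde{G}$, denoted as $\mathcal{W}(\widetilde{G}) = \{G \mid \widetilde{G} \Rightarrow G\}$. Thus,
each uncertain graph $\widetilde{G}$ corresponds to a probability
distribution over $\mathcal{W}(\widetilde{G})$. We denote the probability of
each certain graph $G \in \mathcal{W}(\widetilde{G})$ being implied by the
uncertain graph $\widetilde{G}$ as $\Pr(\widetilde{G} \Rightarrow G)$, and we have

$$\Pr[\widetilde{G} \Rightarrow G] = \prod_{e \in E(G)} \Pr_{\widetilde{G}}(e) \prod_{e \in E(\widetilde{G}) - E(G)} \left(1 - \Pr_{\widetilde{G}}(e)\right)$$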
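The possible-world semantics can be checked on a toy example. The following is a minimal sketch, assuming independent edges as stated above; the function name `possible_worlds` and the three-edge graph `toy_graph` are hypothetical and chosen only for illustration. It enumerates every world of a small uncertain graph together with its probability, which is feasible only because the graph is tiny.

```python
from itertools import product


def possible_worlds(edge_probs):
    """Yield (edge_set, probability) for every possible world.

    edge_probs maps an edge, e.g. ("a", "b"), to its existence
    probability p(e). Edges are assumed to be independent.
    """
    edges = sorted(edge_probs)
    for mask in product([False, True], repeat=len(edges)):
        prob = 1.0
        present = set()
        for edge, keep in zip(edges, mask):
            if keep:
                prob *= edge_probs[edge]        # edge is sampled into the world
                present.add(edge)
            else:
                prob *= 1.0 - edge_probs[edge]  # edge is left out of the world
        yield frozenset(present), prob


# Hypothetical uncertain graph with three edges.
toy_graph = {("a", "b"): 0.9, ("b", "c"): 0.5, ("a", "c"): 0.2}
worlds = list(possible_worlds(toy_graph))
total = sum(p for _, p in worlds)  # probabilities over all worlds sum to 1
```

With three edges there are $2^3 = 8$ worlds, which illustrates why enumeration does not scale to graphs with hundreds of edges.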



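Similarly, possible instantiations of an uncertain
graph dataset $\widetilde{\mathcal{D}} = \{\widetilde{G}_1, \dots, \widetilde{G}_n\}$ are referred to as
worlds of $\widetilde{\mathcal{D}}$, where each world corresponds to an
implied certain graph dataset $\mathcal{D} = \{G_1, \dots, G_n\}$. A
certain graph dataset $\mathcal{D}$ is said to be implied from
uncertain graph dataset $\widetilde{\mathcal{D}}$ (denoted as $\widetilde{\mathcal{D}} \Rightarrow \mathcal{D}$), iff
$|\mathcal{D}| = |\widetilde{\mathcal{D}}|$ and $\forall i \in \{1, \dots, |\widetilde{\mathcal{D}}|\}$, $\widetilde{G}_i \Rightarrow G_i$. There
are $\prod_{i=1}^{n} 2^{|E(\widetilde{G}_i)|}$ possible worlds for uncertain graph
dataset $\widetilde{\mathcal{D}}$, denoted as $\mathcal{W}(\widetilde{\mathcal{D}}) = \{\mathcal{D} \mid \widetilde{\mathcal{D}} \Rightarrow \mathcal{D}\}$. An
uncertain graph dataset $\widetilde{\mathcal{D}}$ corresponds to a probability
distribution over $\mathcal{W}(\widetilde{\mathcal{D}})$. We denote the probability of
each certain graph dataset $\mathcal{D} \in \mathcal{W}(\widetilde{\mathcal{D}})$ being implied by
$\widetilde{\mathcal{D}}$ as $\Pr(\widetilde{\mathcal{D}} \Rightarrow \mathcal{D})$. By assuming that different uncertain
graphs are independent of each other, we have

$$\Pr[\widetilde{\mathcal{D}} \Rightarrow \mathcal{D}] = \prod_{i=1}^{|\widetilde{\mathcal{D}}|} \Pr[\widetilde{G}_i \Rightarrow G_i]$$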



The concept of subgraph is defined based upon 
certain graphs. Different from conventional subgraph 
mining problems where each subgraph feature can have 
multiple embeddings within one graph object, in our 
data model, each subgraph feature g can only have one 
unique embedding within a certain graph G. 

Definition 3. (Subgraph) Let $g = (V', E')$ and $G = (V, E)$
be two certain graphs. $g$ is a subgraph of $G$
(denoted as $g \subseteq G$) iff $V' \subseteq V$ and $E' \subseteq E$.
We also say that $G$ contains subgraph $g$.

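Table 1: Important Notations.

Symbol | Definition
------ | ----------
$\widetilde{\mathcal{D}} = \{\widetilde{G}_1, \dots, \widetilde{G}_n\}$ | uncertain graph dataset; $\widetilde{G}_i$ denotes the $i$-th uncertain graph in the dataset
$\mathbf{y} = [y_1, \dots, y_n]^\top$ | class label vector for graphs in $\widetilde{\mathcal{D}}$, $y_i \in \{+1, -1\}$
$\widetilde{\mathcal{D}}_+$ and $\widetilde{\mathcal{D}}_-$ | the subsets of $\widetilde{\mathcal{D}}$ with only positive/negative graphs, $\widetilde{\mathcal{D}}_+ = \{\widetilde{G}_i \mid \widetilde{G}_i \in \widetilde{\mathcal{D}}, y_i = +1\}$
$n_+$ and $n_-$ | number of positive graphs and number of negative graphs in $\widetilde{\mathcal{D}}$, $n_+ = |\widetilde{\mathcal{D}}_+|$ and $n_- = |\widetilde{\mathcal{D}}_-|$
$\mathcal{D} = \{G_1, \dots, G_n\}$ | a certain graph dataset implied from $\widetilde{\mathcal{D}}$; $G_i$ denotes the certain graph implied from $\widetilde{G}_i$
$g \subseteq G$ | graph $G$ contains subgraph feature $g$
$n_+^g$ and $n_-^g$ | number of graphs in $\mathcal{D}_+$ / $\mathcal{D}_-$ that contain subgraph $g$, $n_+^g = |\{G_i \mid g \subseteq G_i, G_i \in \mathcal{D}_+\}|$
$\widetilde{G} \Rightarrow G$ and $\widetilde{\mathcal{D}} \Rightarrow \mathcal{D}$ | certain graph $G$ is implied from uncertain graph $\widetilde{G}$; $\mathcal{D}$ is implied from $\widetilde{\mathcal{D}}$
$\mathcal{W}(\widetilde{G})$ and $\mathcal{W}(\widetilde{\mathcal{D}})$ | the possible worlds of $\widetilde{G}$ and $\widetilde{\mathcal{D}}$, $\mathcal{W}(\widetilde{G}) = \{G \mid \widetilde{G} \Rightarrow G\}$, $\mathcal{W}(\widetilde{\mathcal{D}}) = \{\mathcal{D} \mid \widetilde{\mathcal{D}} \Rightarrow \mathcal{D}\}$
$E(\widetilde{G}_i)$ and $E(G_i)$ | the set of edges in $\widetilde{G}_i$ and $G_i$
$\widetilde{\mathcal{D}}_+(k)$ and $\widetilde{\mathcal{D}}_-(k)$ | the first $k$ graphs in $\widetilde{\mathcal{D}}_+$ or $\widetilde{\mathcal{D}}_-$

For an uncertain graph $\widetilde{G}$, the probability of $\widetilde{G}$ containing
a subgraph feature $g$ is defined as follows:

$$\Pr(g \subseteq \widetilde{G}) = \sum_{G \in \mathcal{W}(\widetilde{G})} \Pr(\widetilde{G} \Rightarrow G) \cdot I(g \subseteq G) = \begin{cases} \prod_{e \in E(g)} p(e) & \text{if } E(g) \subseteq E(\widetilde{G}) \\ 0 & \text{otherwise} \end{cases}$$

which corresponds to the probability that a certain
graph $G$ implied by $\widetilde{G}$ contains subgraph $g$.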
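Because every node has a unique label and edges are independent, the containment probability reduces to a product over the edges of the subgraph. The sketch below illustrates this closed form; the names `containment_probability`, `feature_vector`, `brain`, and `features` are hypothetical, and each undirected edge is assumed to be stored once with its endpoints in a fixed order.

```python
from math import prod


def containment_probability(subgraph_edges, edge_probs):
    """Pr(g is contained in the uncertain graph).

    Product of p(e) over the edges of g; an edge missing from the
    uncertain graph contributes probability 0.
    """
    return prod(edge_probs.get(e, 0.0) for e in subgraph_edges)


def feature_vector(features, edge_probs):
    """One containment probability per subgraph feature."""
    return [containment_probability(g, edge_probs) for g in features]


# Hypothetical uncertain graph over three regions and three candidate features.
brain = {("r1", "r2"): 0.9, ("r2", "r3"): 0.5}
features = [
    [("r1", "r2")],
    [("r1", "r2"), ("r2", "r3")],
    [("r1", "r3")],            # this edge is absent from the graph
]
x = feature_vector(features, brain)
```

Each entry of `x` is a probability in [0, 1], so an uncertain graph is mapped to a real-valued feature vector rather than a binary one.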

We focus on mining a set of discriminative subgraph
features to define the feature space of graph classification.
It is assumed that a graph object $\widetilde{G}_i$ is represented
as a feature vector $\mathbf{x}_i = [x_i^1, \dots, x_i^m]^\top$ associated
with a set of subgraph features $\{g_1, \dots, g_m\}$.
Here, $x_i^k = \Pr(g_k \subseteq \widetilde{G}_i)$ is the probability that $\widetilde{G}_i$
contains the subgraph feature $g_k$. Now suppose the
full set of subgraph features in the graph dataset $\widetilde{\mathcal{D}}$ is
$\mathcal{S} = \{g_1, \dots, g_m\}$, which we use to predict the class
labels of the graph objects. The full feature set $\mathcal{S}$ is very
large. Only a subset of the subgraph features ($\mathcal{T} \subseteq \mathcal{S}$) is
relevant to the graph classification task, which is the target
feature set we want to find within uncertain graphs.

The key issues of discriminative subgraph mining 
for uncertain graphs can be described as follows: 
(P1) How can one properly evaluate the discrimination
scores of a subgraph feature considering the uncertainty
of the graph structures?

(P2) How can one efficiently compute the probability 
distribution of a subgraph's discrimination score by 
avoiding the exhaustive enumeration of all possible 
worlds of the uncertain graph dataset? Moreover, 
since the subgraph enumeration is NP-hard, it is also 
infeasible to fully enumerate all the subgraph features 
for an uncertain graph dataset. 

In the following sections, we will introduce the pro- 
posed framework for mining discriminative subgraphs 
from uncertain graphs. 

3 The Proposed Framework 

3.1 Discrimination Score Distribution In this
subsection, we address the problem (P1) discussed in
the previous section. In conventional discriminative
subgraph mining, the discrimination scores of subgraph
features are usually defined for certain graph datasets,
e.g., information gain and G-test score [22]. The score
of a subgraph feature is a fixed value indicating the
discriminative power of the subgraph feature for the graph
classification task. However, such concepts do not carry
over directly to uncertain graph datasets, since an uncertain
graph only contains a subgraph feature in a probabilistic
sense. We now extend the concept of discriminative
subgraph features to uncertain graph datasets. Suppose
we have an objective function $F(g, \mathcal{D})$ which measures
the discrimination score of a subgraph $g$ in a certain
graph dataset $\mathcal{D}$. The corresponding objective function
on an uncertain graph dataset $\widetilde{\mathcal{D}}$ can be written as
$F(g, \widetilde{\mathcal{D}})$ accordingly. Note that $F(g, \widetilde{\mathcal{D}})$ is no longer a
deterministic function. $F(g, \widetilde{\mathcal{D}})$ corresponds to a random
variable over all possible outcomes of $F(g, \mathcal{D})$ (i.e.,
Range($F$)) with probability distribution:

$$\begin{array}{c|ccc} \text{score} & s_1 & s_2 & \cdots \\ \hline \text{probability} & \Pr[F(g, \widetilde{\mathcal{D}}) = s_1] & \Pr[F(g, \widetilde{\mathcal{D}}) = s_2] & \cdots \end{array}$$

where $s_i \in$ Range($F$).

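The probability distribution of the discrimination
score values can be defined as follows:

$$\Pr[F(g, \widetilde{\mathcal{D}}) = s] = \sum_{\mathcal{D} \in \mathcal{W}(\widetilde{\mathcal{D}})} \Pr[\widetilde{\mathcal{D}} \Rightarrow \mathcal{D}] \cdot I(F(g, \mathcal{D}) = s)$$

where $I(\pi) \in \{0, 1\}$ is an indicator function, and
$I(\pi) = 1$ iff $\pi$ holds. In other words, $\forall s \in$ Range($F$),
$\Pr[F(g, \widetilde{\mathcal{D}}) = s]$ is the summation over the probabilities
of all worlds of $\widetilde{\mathcal{D}}$ in which the discrimination score
$F(g, \mathcal{D})$ is exactly $s$. Based on the discrimination score
function on uncertain graphs, we define four statistical
measures that evaluate the properties of the distribution
of $F(g, \widetilde{\mathcal{D}})$ from different perspectives.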



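Definition 4. (Mean-Score) Given an uncertain
graph dataset $\widetilde{\mathcal{D}}$, a subgraph feature $g$ and a discrimination
score function $F(\cdot, \cdot)$, we define the expected
discrimination score $\mathrm{Exp}(F(g, \widetilde{\mathcal{D}}))$ as the mean score
among all possible worlds of $\widetilde{\mathcal{D}}$:

$$\mathrm{Exp}(F(g, \widetilde{\mathcal{D}})) = \sum_{\mathcal{D} \in \mathcal{W}(\widetilde{\mathcal{D}})} \Pr[\widetilde{\mathcal{D}} \Rightarrow \mathcal{D}] \cdot F(g, \mathcal{D}) = \sum_{s = -\infty}^{+\infty} s \cdot \Pr[F(g, \widetilde{\mathcal{D}}) = s]$$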

The mean discrimination score is the expectation
of the random variable $F(g, \widetilde{\mathcal{D}})$. The expectation is
usually used in conventional frequent pattern mining on
uncertain datasets [23, 26]. However, it is worth noting
that the expectation of discrimination scores may not be
robust to extreme values. In discriminative subgraph
mining, the value of a score function (e.g., frequency
ratio [10], G-test score [22]) can be $+\infty$. Such cases can
easily dominate the computation of expectation, even
if the probabilities are extremely small. For example,
suppose we have a subgraph feature with the (score,
probability) pairs (0.01, 99.99%) and ($+\infty$, 0.01%).
The expected score will be $+\infty$. In order to address
this problem, we either need to bound the maximum
value of the objective function by a finite cap, i.e., take
the minimum of $F(g, \mathcal{D})$ and that cap, or
we need to introduce other statistical measures which
are robust to extreme values.

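Definition 5. (Median-Score) Given an uncertain
graph dataset $\widetilde{\mathcal{D}}$, a subgraph feature $g$ and a discrimination
score function $F(\cdot, \cdot)$ on certain graphs, we define
the median discrimination score $\mathrm{Median}(F(g, \widetilde{\mathcal{D}}))$ as
the median score among all possible worlds of $\widetilde{\mathcal{D}}$:

$$\mathrm{Median}(F(g, \widetilde{\mathcal{D}})) = \arg\max_{s} \left\{ \sum_{x = -\infty}^{s} \Pr[F(g, \widetilde{\mathcal{D}}) = x] \le \frac{1}{2} \right\}$$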



The median score is relatively more robust to extreme
values than expectation, although in some cases the
median score can still be infinite. The same results can
also hold for any quantile or $k$-th order statistic.

Another commonly used statistic is the mode score,
i.e., the score value that has the largest probability. The
mode score of a distribution is the score that is most
likely to be observed within all possible worlds of $\widetilde{\mathcal{D}}$.

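Definition 6. (Mode-Score) Given an uncertain
graph dataset $\widetilde{\mathcal{D}}$, a subgraph feature $g$ and a discrimination
score function $F(\cdot, \cdot)$, we define the mode
discrimination score $\mathrm{Mode}(F(g, \widetilde{\mathcal{D}}))$ as the score that is
most likely among all possible worlds of $\widetilde{\mathcal{D}}$:

$$\mathrm{Mode}(F(g, \widetilde{\mathcal{D}})) = \arg\max_{s} \Pr[F(g, \widetilde{\mathcal{D}}) = s]$$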

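Next we consider the probability of a subgraph feature
being observed as a discriminative pattern within
all possible worlds of $\widetilde{\mathcal{D}}$, i.e., $\Pr[F(g, \widetilde{\mathcal{D}}) \ge \varphi]$. It is
called the $\varphi$-probability. The higher the value, the more
likely it is that the subgraph feature is a discriminative
pattern with a score larger than or equal to a threshold $\varphi$.

Table 2: Summary of Discrimination Score Functions.

Name | $f(n_+^g, n_-^g, n_+, n_-)$
---- | ----
confidence | $\frac{n_+^g}{n_+^g + n_-^g}$
frequency ratio | $\left| \log \frac{n_+^g \cdot n_-}{n_-^g \cdot n_+} \right|$
G-test | $2 n_+^g \cdot \ln \frac{n_+^g \cdot n_-}{n_-^g \cdot n_+} + 2 (n_+ - n_+^g) \cdot \ln \frac{n_- \cdot (n_+ - n_+^g)}{n_+ \cdot (n_- - n_-^g)}$
HSIC (linear) | …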

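Definition 7. ($\varphi$-Probability) Given an uncertain
graph dataset $\widetilde{\mathcal{D}}$, a subgraph feature $g$ and a discrimination
score function $F(\cdot, \cdot)$, we define the $\varphi$-probability
for discrimination score function $F(g, \widetilde{\mathcal{D}})$ as the sum of
probabilities for all possible worlds of $\widetilde{\mathcal{D}}$ where the score
is greater than or equal to $\varphi$:

$$\varphi\text{-Pr}(F(g, \widetilde{\mathcal{D}})) = \sum_{\mathcal{D} \in \mathcal{W}(\widetilde{\mathcal{D}})} \Pr[\widetilde{\mathcal{D}} \Rightarrow \mathcal{D}] \cdot I(F(g, \mathcal{D}) \ge \varphi) = \sum_{s = \varphi}^{+\infty} \Pr[F(g, \widetilde{\mathcal{D}}) = s]$$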

The $\varphi$-probability is robust to extreme values
of the objective function. For the previous example,
we have a subgraph feature with score distribution:
(0.01, 99.99%), ($+\infty$, 0.01%). The $\varphi$-probability
is 0.01%, when $\varphi = 1$.
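The running example can be reproduced numerically. The sketch below computes the four statistics on a discrete score distribution given as (score, probability) pairs. The function names are hypothetical, and the median uses an assumed convention (the smallest score whose cumulative probability reaches 1/2), which may differ from Definition 5 at boundary cases.

```python
import math


def expectation(dist):
    """Mean score; a single +inf score with nonzero probability makes it +inf."""
    return sum(s * p for s, p in dist)


def median(dist):
    """Assumed convention: smallest score whose cumulative probability reaches 1/2."""
    ordered = sorted(dist)
    cumulative = 0.0
    for s, p in ordered:
        cumulative += p
        if cumulative >= 0.5:
            return s
    return ordered[-1][0]


def mode(dist):
    """Score with the largest probability."""
    return max(dist, key=lambda pair: pair[1])[0]


def phi_probability(dist, phi):
    """Total probability of scores greater than or equal to phi."""
    return sum(p for s, p in dist if s >= phi)


# The example from the text: (0.01, 99.99%) and (+inf, 0.01%).
example = [(0.01, 0.9999), (math.inf, 0.0001)]
```

On this distribution the expectation is infinite, while the median and mode both equal 0.01 and the $\varphi$-probability for $\varphi = 1$ is 0.0001, matching the discussion above.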

We have already introduced four statistical measures
of the distribution of a discrimination score function.
Now the central problem for calculating all these
measures is how to calculate $\Pr[F(g, \widetilde{\mathcal{D}}) = s]$ efficiently,
which we will discuss in the following section.

3.2 Efficient Computation In this subsection, we
address the problem (P2) discussed in Section 2. Given
a certain graph dataset $\mathcal{D}$, we denote the subsets of all
positive graphs and all negative graphs as $\mathcal{D}_+$ and $\mathcal{D}_-$,
respectively. Suppose the supports of subgraph feature
$g$ in $\mathcal{D}_+$ and $\mathcal{D}_-$ are $n_+^g$ and $n_-^g$, e.g.,
$n_+^g = |\{G \mid G \in \mathcal{D}_+, g \subseteq G\}|$.
Most of the existing discrimination score functions
can be written as a function of $n_+^g$, $n_-^g$, $n_+$ and $n_-$:

(3.1) $F(g, \mathcal{D}) = f(n_+^g, n_-^g, n_+, n_-)$

The definition in Eq. 3.1 covers many discrimination
score functions including confidence, frequency
ratio [10], information gain, G-test score [22] and
HSIC [13], as shown in Table 2. For example, frequency
ratio can be written as

$$r(g) = \left| \log \frac{n_+^g \cdot n_-}{n_-^g \cdot n_+} \right|$$

The G-test score can be written as

$$\text{G-test}(g) = 2 n_+^g \cdot \ln \frac{n_+^g \cdot n_-}{n_-^g \cdot n_+} + 2 (n_+ - n_+^g) \cdot \ln \frac{n_- \cdot (n_+ - n_+^g)}{n_+ \cdot (n_- - n_-^g)}$$

Because $n_+$ and $n_-$ are fixed numbers for different subgraph
features, we simply use $f(n_+^g, n_-^g)$ for $f(n_+^g, n_-^g, n_+, n_-)$.

Figure 3: The dynamic programming process for computing
$\Pr[n_+^g, \widetilde{\mathcal{D}}_+]$. The same process applies for
$\Pr[n_-^g, \widetilde{\mathcal{D}}_-]$.
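As an illustration of score functions of the form $f(n_+^g, n_-^g, n_+, n_-)$, the sketch below implements the frequency ratio and the G-test score from the four counts. It assumes the standard forms of these two scores; the function names and the handling of zero counts (returning $+\infty$, or 0 when a pattern is absent from both classes) are assumptions made for this example.

```python
import math


def frequency_ratio(n_pos_g, n_neg_g, n_pos, n_neg):
    """|log(p / q)| with p = n_pos_g / n_pos and q = n_neg_g / n_neg."""
    p = n_pos_g / n_pos
    q = n_neg_g / n_neg
    if p == 0 and q == 0:
        return 0.0          # assumption: absent from both classes, not discriminative
    if p == 0 or q == 0:
        return math.inf     # the score is unbounded when one support is zero
    return abs(math.log(p / q))


def g_test(n_pos_g, n_neg_g, n_pos, n_neg):
    """2 * n_pos * (p ln(p/q) + (1-p) ln((1-p)/(1-q)))."""
    p = n_pos_g / n_pos
    q = n_neg_g / n_neg

    def term(a, b):
        # a * ln(a / b), with 0 * ln(0 / b) taken as 0 and a * ln(a / 0) as +inf
        if a == 0:
            return 0.0
        if b == 0:
            return math.inf
        return a * math.log(a / b)

    return 2 * n_pos * (term(p, q) + term(1 - p, 1 - q))
```

For instance, with four positive and four negative graphs, a pattern found in three positive graphs and no negative graph has an infinite frequency ratio, which is the kind of extreme value discussed in Section 3.1.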

Based on the above definitions, we find that the
number of possible outcomes of $F(g, \mathcal{D})$ is bounded by
$(n_+ + 1) \times (n_- + 1)$, because $0 \le n_+^g \le n_+$ and $0 \le n_-^g \le n_-$. Thus,
the probabilities $\Pr[F(g, \widetilde{\mathcal{D}}) = s]$ can be exactly computed
via dynamic programming in $O(n^2)$ time, without
enumerating all possible worlds of $\widetilde{\mathcal{D}}$. Instead, we
can just enumerate all possible combinations of $(n_+^g, n_-^g)$
and calculate the probability for each pair $(n_+^g, n_-^g)$,
denoted as

$$\Pr[n_+^g, n_-^g, \widetilde{\mathcal{D}}] = \Pr[F(g, \widetilde{\mathcal{D}}) = f(n_+^g, n_-^g)]$$

Then the values of $F(g, \mathcal{D})$ in all possible worlds with
non-zero probabilities can be covered by these
$(n_+ + 1) \times (n_- + 1)$ cases.

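Moreover, because different uncertain graphs are
independent of each other, we have

(3.2) $\Pr[n_+^g, n_-^g, \widetilde{\mathcal{D}}] = \Pr[n_+^g, \widetilde{\mathcal{D}}_+] \cdot \Pr[n_-^g, \widetilde{\mathcal{D}}_-]$

where $\Pr[n_+^g, \widetilde{\mathcal{D}}_+]$ denotes the probability of the cases
when there are $n_+^g$ graphs in $\widetilde{\mathcal{D}}_+$ that contain the
subgraph $g$. $\Pr[n_-^g, \widetilde{\mathcal{D}}_-]$ corresponds to the cases when
there are $n_-^g$ graphs in $\widetilde{\mathcal{D}}_-$ that contain subgraph
$g$. Now we just need to compute the probabilities
$\Pr[n_+^g, \widetilde{\mathcal{D}}_+]$ ($\forall n_+^g$, $0 \le n_+^g \le n_+$) and $\Pr[n_-^g, \widetilde{\mathcal{D}}_-]$
($\forall n_-^g$, $0 \le n_-^g \le n_-$) separately.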

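Let $\widetilde{\mathcal{D}}(k)$ denote the first $k$ uncertain graphs in
$\widetilde{\mathcal{D}}$, i.e., $\widetilde{\mathcal{D}}(k) = \{\widetilde{G}_1, \dots, \widetilde{G}_k\}$. $\widetilde{\mathcal{D}}_+(k)$ and $\widetilde{\mathcal{D}}_-(k)$
denote the first $k$ graphs in $\widetilde{\mathcal{D}}_+$ and $\widetilde{\mathcal{D}}_-$ respectively.
All the values of $\Pr[n_+^g, \widetilde{\mathcal{D}}_+]$ and $\Pr[n_-^g, \widetilde{\mathcal{D}}_-]$ can be
calculated using the recursive equation in Figure 5. Here
$\Pr[i, \widetilde{\mathcal{D}}(k)]$ denotes the probability that exactly $i$
graphs in $\widetilde{\mathcal{D}}(k)$ contain $g$. The target values
to calculate are $\Pr[i, \widetilde{\mathcal{D}}_+(n_+)]$ ($\forall i$, $0 \le i \le n_+$) and
$\Pr[i, \widetilde{\mathcal{D}}_-(n_-)]$ ($\forall i$, $0 \le i \le n_-$), obtained by substituting
$\widetilde{\mathcal{D}}_+$ and $\widetilde{\mathcal{D}}_-$ into Eq. 3.3 respectively. In Figure 4 we
show the dynamic programming algorithm to compute
the target values using Eq. 3.3. Figure 3 illustrates
the computation process of the dynamic programming
algorithm for $\Pr[n_+^g, \widetilde{\mathcal{D}}_+]$, while the same process also
applies for $\Pr[n_-^g, \widetilde{\mathcal{D}}_-]$.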

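For details of the recursive equations in Figure 5, we
have the base cases $\Pr[0, \widetilde{\mathcal{D}}(0)] = 1$ and $\Pr[i, \widetilde{\mathcal{D}}(k)] = 0$
(if $i > k$ or $i < 0$). For other cases, the probability
value can be calculated through the recursive equation
in Eq. 3.3. Then, $\Pr[n_+^g, n_-^g, \widetilde{\mathcal{D}}]$ can be calculated via
Eq. 3.2. Thus all the statistical measures mentioned
in Section 3.1 can be calculated within $O(n^2)$ time as
follows:

$$\mathrm{Exp}(F(g, \widetilde{\mathcal{D}})) = \sum_{n_+^g = 0}^{n_+} \sum_{n_-^g = 0}^{n_-} \Pr[n_+^g, n_-^g, \widetilde{\mathcal{D}}] \cdot f(n_+^g, n_-^g)$$

$$\mathrm{Median}(F(g, \widetilde{\mathcal{D}})) = \arg\max_{s} \left\{ \sum_{x = -\infty}^{s} \; \sum_{f(n_+^g, n_-^g) = x} \Pr[n_+^g, n_-^g, \widetilde{\mathcal{D}}] \le \frac{1}{2} \right\}$$

$$\mathrm{Mode}(F(g, \widetilde{\mathcal{D}})) = \arg\max_{s} \sum_{n_+^g = 0}^{n_+} \sum_{n_-^g = 0}^{n_-} \Pr[n_+^g, n_-^g, \widetilde{\mathcal{D}}] \cdot I(f(n_+^g, n_-^g) = s)$$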



We will show later that the dynamic programming
process is highly efficient in all the applications studied
in Section 4. For datasets with an even larger number of
graphs, the divide-and-conquer method in [19] could also
be used here to further reduce the computational cost.
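The recursion over the first $k$ graphs can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: `support_distribution` applies the recurrence one graph at a time, and `score_distribution` combines the positive and negative supports under the independence assumption. The function names, the toy probabilities, and the toy score $i - j$ are hypothetical.

```python
def support_distribution(containment_probs):
    """Return dist with dist[i] = Pr[exactly i graphs contain g].

    containment_probs[k] is the probability that the (k+1)-th uncertain
    graph contains g; graphs are assumed independent.
    """
    dist = [1.0]                            # base case: zero graphs, zero supports
    for p in containment_probs:             # add one graph at a time
        nxt = [0.0] * (len(dist) + 1)
        for i, mass in enumerate(dist):
            nxt[i] += (1.0 - p) * mass      # the new graph does not contain g
            nxt[i + 1] += p * mass          # the new graph contains g
        dist = nxt
    return dist


def score_distribution(pos_probs, neg_probs, score):
    """Map each score value to its probability over all support pairs."""
    pos = support_distribution(pos_probs)
    neg = support_distribution(neg_probs)
    out = {}
    for i, p_i in enumerate(pos):
        for j, p_j in enumerate(neg):
            s = score(i, j)
            out[s] = out.get(s, 0.0) + p_i * p_j   # positive and negative sets are independent
    return out


# Hypothetical containment probabilities for two positive and two negative graphs.
dist_pos = support_distribution([0.9, 0.8])
scores = score_distribution([0.9, 0.8], [0.1, 0.3], lambda i, j: i - j)
expected = sum(s * p for s, p in scores.items())
```

Each call to `support_distribution` costs time quadratic in the number of graphs, and the double loop in `score_distribution` visits every support pair once, so no possible world is ever enumerated.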

3.3 Upper-Bounds for Subgraph Pruning In
order to avoid the exhaustive enumeration of subgraph
features, we derive some subgraph pruning
methods. One natural pruning bound for subgraph
search is the expected frequency of a subgraph feature,
$\text{Exp-Freq}(g, \widetilde{\mathcal{D}}) = \frac{\sum_{i} \Pr(g \subseteq \widetilde{G}_i)}{|\widetilde{\mathcal{D}}|}$, since it can easily be
proved to have the anti-monotonic property. For the
expectation and $\varphi$-probability, we can also derive additional
bounds for subgraph pruning. Let $\hat{F}(g, \mathcal{D}) = \hat{f}(n_+^g, n_-^g)$
be the estimated upper-bound function for $g$ and its
supergraphs in certain graph dataset $\mathcal{D}$. We can derive
the corresponding upper-bounds as follows:

$$\text{UB-Exp}(g, \widetilde{\mathcal{D}}) = \sum_{n_+^g = 0}^{n_+} \sum_{n_-^g = 0}^{n_-} \Pr[n_+^g, n_-^g, \widetilde{\mathcal{D}}] \cdot \hat{f}(n_+^g, n_-^g)$$

$$\text{UB-}\varphi\text{-Pr}(g, \widetilde{\mathcal{D}}) = \sum_{n_+^g = 0}^{n_+} \sum_{n_-^g = 0}^{n_-} \Pr[n_+^g, n_-^g, \widetilde{\mathcal{D}}] \cdot I(\hat{f}(n_+^g, n_-^g) \ge \varphi)$$

For the median and mode measures, it is difficult to
derive a meaningful bound, thus we simply use the
expected frequency to perform the subgraph pruning.

Input:
  $\widetilde{\mathcal{D}}_+$: the set of positive graphs
  $\widetilde{\mathcal{D}}_-$: the set of negative graphs
Dynamic Programming:
  for $n_+^g \leftarrow 0$ to $n_+$
    for $k \leftarrow n_+^g$ to $n_+$
      compute $\Pr[n_+^g, \widetilde{\mathcal{D}}_+(k)]$ via Eq. 3.3
  for $n_-^g \leftarrow 0$ to $n_-$
    for $k \leftarrow n_-^g$ to $n_-$
      compute $\Pr[n_-^g, \widetilde{\mathcal{D}}_-(k)]$ via Eq. 3.3
Output:
  $\Pr[n_+^g, \widetilde{\mathcal{D}}_+]$ ($\forall n_+^g$, $0 \le n_+^g \le n_+$)
  $\Pr[n_-^g, \widetilde{\mathcal{D}}_-]$ ($\forall n_-^g$, $0 \le n_-^g \le n_-$)

Figure 4: The dynamic programming algorithm for
probability computation.

(3.3)
$$\Pr[i, \widetilde{\mathcal{D}}(k)] = \begin{cases} \left(1 - \Pr[g \subseteq \widetilde{G}_k]\right) \cdot \Pr[i, \widetilde{\mathcal{D}}(k-1)] + \Pr[g \subseteq \widetilde{G}_k] \cdot \Pr[i-1, \widetilde{\mathcal{D}}(k-1)] & \text{if } 0 \le i \le k,\ k > 0 \\ 1 & \text{if } i = k = 0 \\ 0 & \text{if } i > k \text{ or } i < 0 \end{cases}$$

Figure 5: Recursive equation for dynamic programming.

Input:
  $\widetilde{\mathcal{D}}$: the uncertain graph dataset $\{\widetilde{G}_1, \dots, \widetilde{G}_n\}$
  $\mathbf{y}$: the vector of class labels for uncertain graphs
  $t$: the maximum number of subgraphs
  min_sup: the minimum expected frequency
  $M$: the statistic measure (Expectation/Median/Mode/$\varphi$-Pr)
Recursive Subgraph Mining:
  - Depth-first search the gSpan code tree and update the feature list as follows:
  1. Update the candidate feature list using the current subgraph feature $g_c$:
     Calculate the probability vectors $\Pr[n_+^{g_c}, \widetilde{\mathcal{D}}_+]$ and $\Pr[n_-^{g_c}, \widetilde{\mathcal{D}}_-]$ using the dynamic programming algorithm in Figure 4.
     Compute the statistic measure $M(F(g_c, \widetilde{\mathcal{D}}))$ based on the discrimination score function $F(g_c, \mathcal{D})$.
     If the score is larger than that of the worst feature in $\mathcal{T}$, replace it and update $\theta = \min_{g \in \mathcal{T}} M(F(g, \widetilde{\mathcal{D}}))$.
  2. Test pruning criteria for the sub-tree rooted at node $g_c$ as follows:
     if $\text{Exp-Freq}(g_c) <$ min_sup, prune the sub-tree of $g_c$
     if $\text{Bound-}M(F(g_c, \widetilde{\mathcal{D}})) < \theta$, prune the sub-tree of $g_c$
  3. Recursion: Depth-first search the sub-tree rooted at node $g_c$
Output:
  $\mathcal{T}$: the discriminative subgraph features for uncertain graph classification.

Figure 6: The Dug framework for discriminative subgraph mining.

We now utilize the above bounds to prune the
DFS-code tree in gSpan [53] by branch-and-bound
pruning. The top-$t$ best features are maintained in
a candidate list. During the subgraph mining, we
calculate the upper-bound of each subgraph feature
in the search tree. If a subgraph feature and its
child patterns cannot update the candidate feature
list, we can prune the subtree of gSpan rooted at this
node. It is guaranteed by the upper-bounds that we
will not miss any better subgraph features. Thus, the
subgraph mining process can be sped up without loss
of performance. The algorithm of Dug is summarized
in Figure 6.
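The interplay between the top-$t$ candidate list and the two pruning tests can be sketched on a simplified search space. This example is not the gSpan-based procedure: it enumerates plain edge sets instead of DFS codes and ignores connectivity, and it uses a hypothetical toy measure (expected positive frequency minus expected negative frequency) whose value for any supergraph is bounded by the expected positive frequency of the current pattern. All names are assumptions for illustration.

```python
import heapq
from math import prod


def contain_prob(pattern, graph):
    """Pr(pattern is contained in graph); graph maps edge -> probability."""
    return prod(graph.get(e, 0.0) for e in pattern)


def mine_top_t(pos_graphs, neg_graphs, t, min_sup):
    """Depth-first search over edge sets with frequency and bound pruning."""
    all_graphs = pos_graphs + neg_graphs
    edges = sorted({e for g in all_graphs for e in g})
    top = []  # min-heap of (score, pattern); top[0] is the worst kept feature

    def exp_freq(pattern, graphs):
        return sum(contain_prob(pattern, g) for g in graphs) / len(graphs)

    def dfs(pattern, start):
        for idx in range(start, len(edges)):
            child = pattern + (edges[idx],)
            if exp_freq(child, all_graphs) < min_sup:
                continue  # anti-monotone: no supergraph can be frequent either
            pos_freq = exp_freq(child, pos_graphs)
            score = pos_freq - exp_freq(child, neg_graphs)  # toy measure
            if len(top) < t:
                heapq.heappush(top, (score, child))
            elif score > top[0][0]:
                heapq.heapreplace(top, (score, child))
            # pos_freq is an upper bound on the score of every supergraph
            if len(top) == t and pos_freq <= top[0][0]:
                continue  # bound pruning: the subtree cannot improve the list
            dfs(child, idx + 1)

    dfs((), 0)
    return sorted(top, reverse=True)


# Hypothetical dataset: two positive and two negative uncertain graphs.
pos = [{("a", "b"): 0.9, ("b", "c"): 0.8}, {("a", "b"): 0.8, ("b", "c"): 0.9}]
neg = [{("a", "b"): 0.9, ("b", "c"): 0.1}, {("a", "b"): 0.7, ("b", "c"): 0.2}]
best = mine_top_t(pos, neg, t=1, min_sup=0.1)
```

In this toy dataset the edge ("a", "b") is frequent in both classes, so the single edge ("b", "c") ends up as the best feature. The same control flow applies when the measure and bound are replaced by one of the statistical measures and its upper bound.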

4 Experiments 

In order to evaluate the performance of the proposed
approach for uncertain graph classification, we tested
our algorithm on real-world fMRI brain images as
summarized in Table 3.

4.1 Data Collection

• Alzheimer's Disease (ADNI): The first dataset is collected
from the Alzheimer's Disease Neuroimaging Initiative
(http://adni.loni.ucla.edu/). The dataset consists of records of patients
with Alzheimer's Disease (AD) and Mild Cognitive Impairment
(MCI). We downloaded all records of resting-state
fMRI images and treated the normal brains as
negative graphs, and AD+MCI as the positive graphs.
We applied Automated Anatomical Labeling (AAL)
to extract a sequence of responses from each of the
116 anatomical volumes of interest (AVOI), where each
AVOI represents a different brain region. The correlations
of brain activities among different brain regions
are computed. Positive correlations are used as uncertain
links among brain regions. In detail, we used the
SPM8 toolbox: functional images were realigned
to the first volume, slice-timing corrected, normalized
to the MNI template and spatially smoothed with
an 8-mm Gaussian kernel. The Resting-State fMRI Data
Analysis Toolkit (REST) was then used to remove the
linear trend of the time series and to apply temporal band-pass
filtering (0.01-0.08 Hz). Before the correlation analysis,
several sources of spurious variance were removed
from the data through linear regression: (i) six parameters
obtained by rigid body correction of head motion,
(ii) the whole-brain signal averaged over a fixed region
in atlas space, (iii) signal from a ventricular region of
interest, and (iv) signal from a region centered in the
white matter. Each brain is represented as an uncertain
graph with 90 nodes corresponding to 90 cerebral
regions, excluding 26 cerebellar regions.

• Attention Deficit Hyperactivity Disorder (ADHD):
The second dataset is collected from the ADHD-200 global
competition dataset. The dataset contains records of
resting-state fMRI images for 776 subjects, which are
labeled as real patients (positive) and normal controls
(negative). Similar to the ADNI dataset, the brain
images are preprocessed using the Athena Pipeline. Since the
original dataset is unbalanced, we randomly sampled
100 ADHD patients and 100 normal controls from the
dataset for performance evaluation.

• Human Immunodeficiency Virus Infection (HIV): The
third dataset is collected from the Chicago Early HIV
Infection Study at Northwestern University [21]. The
dataset contains fMRI brain images of patients with
early HIV infection (positive) as well as normal controls
(negative). The same preprocessing steps as in the ADNI
dataset were used to extract a functional connectivity
network from each image.

Table 3: Summary of experimental datasets.

Dataset   |D|   |D+|   |D-|   |V|   avg. |E|   avg. edge prob.
ADHD      200   100    100    116   484.7      0.55
ADNI       36    18     18     90   2019.8     0.59
HIV        50    25     25     90   480.48     0.88

4.2 Comparative Methods We compared variants of our
method that use different statistical measures and
discrimination score functions, together with several
baselines, summarized as follows:

• Frequent Subgraphs + Expectation (Exp-Freq): The
first baseline method finds frequent subgraph features
within uncertain graphs. This baseline is similar to the
method introduced in [27]. In our data model, this baseline
computes the exact expected frequency of each subgraph
feature instead of approximated values. The top-ranked
frequent patterns are extracted and used as features for
graph classification.

• Dug with HSIC based discrimination scores: we compare
four different versions of our Dug method based upon the
HSIC criterion, which maximizes the dependence between
subgraph features and graph labels [13]. "Exp-HSIC" computes
the expected HSIC value for each subgraph feature and finds
the top-k subgraphs with the largest values. "Med-HSIC"
computes the median HSIC value for each subgraph feature,
while "Mod-HSIC" computes the mode HSIC value. "φ-Pr-HSIC"
computes the φ-probability of the HSIC value for each
subgraph feature.

• Dug with Frequency Ratio based discrimination
scores: we also compare our method based upon the frequency
ratio, i.e., "Exp-Ratio", "Med-Ratio", "Mod-Ratio" and
"φ-Pr-Ratio".

• Dug with G-test based discrimination scores: we then
compare our method based upon the G-test criterion, i.e.,
"Exp-Gtest", "Med-Gtest", "Mod-Gtest" and "φ-Pr-Gtest".

• Dug with Confidence based discrimination scores: the
fifth group of methods is based upon the confidence
criterion, i.e., "Exp-Conf", "Med-Conf", "Mod-Conf" and
"φ-Pr-Conf".

• Simple Thresholding: Another group of methods we
compared consists of feature selection methods for certain
graphs. To obtain certain graphs from the uncertain graphs
in the dataset, we perform simple thresholding over the
weights of the links to get binary links. These baseline
methods include "Freq", "HSIC", "Ratio", "Gtest" and "Conf",
which correspond to the discrimination scores used in the
previous five groups, respectively.
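The four statistical measures named in the method groups above (Exp, Med, Mod and φ-Pr) can be illustrated on a discrete distribution over score values. The sketch below is only an illustration under stated assumptions: the distribution is given as a dictionary from score to probability, the φ-probability is taken to be the probability that the score is at least φ, and the function names and the example distribution are hypothetical rather than taken from the paper.

```python
def expectation(dist):
    """Expected score of a discrete distribution {score: probability}."""
    return sum(score * prob for score, prob in dist.items())


def median(dist):
    """Smallest score whose cumulative probability reaches 0.5."""
    cumulative = 0.0
    for score in sorted(dist):
        cumulative += dist[score]
        if cumulative >= 0.5 - 1e-12:  # tolerate floating-point rounding
            return score
    return max(dist)


def mode(dist):
    """Most probable score (the first one encountered wins ties)."""
    return max(dist, key=dist.get)


def phi_probability(dist, phi):
    """Assumed definition: probability that the score is at least phi."""
    return sum(prob for score, prob in dist.items() if score >= phi)


# Hypothetical score distribution of one subgraph feature.
example = {0.0: 0.2, 0.5: 0.5, 1.0: 0.3}
summary = {
    "Exp": expectation(example),
    "Med": median(example),
    "Mod": mode(example),
    "phi-Pr": phi_probability(example, 0.5),
}
```

Ranking features by any one of these summaries gives a different variant of the same selection procedure, which is why each score function appears four times in the comparison.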

LibSVM [3] with the linear kernel is used as the base
classifier for all compared methods. The minsup values used
in gSpan for the ADHD, ADNI and HIV datasets are 20%, 40%
and 40%, respectively. Since the ranges of the different
discrimination score functions can be extremely different,
we set the default φ for the HSIC criterion, G-test score,
frequency ratio and confidence to 0.03, 200, 1 and 0.5,
respectively.

2 http://www.cyceron.fr/web/aal..anatomical.automatic.lab…
3 http://www.fil.ion.ucl.ac.uk/spm/software/spm8/
4 http://resting-fmri.sourceforge.net
5 http://neurobureau.projects.nitrc.org/ADHD200/
6 http://www.nitrc.org/plugins/mwiki/index.php/neurobureau:AthenaPipeline



Table 4: Results on the ADNI (Alzheimer's Disease) dataset with different numbers of features (t = 100, ..., 500).
The results are reported as "average performance + (rank)".

            Error Rate ↓                                                F1 ↑                                                        Avg.
Methods     t = 100     t = 200     t = 300     t = 400     t = 500     t = 100     t = 200     t = 300     t = 400     t = 500     Rank
Exp-HSIC    0.400 (9)   0.367 (8)   0.367 (10)  0.317 (4)   0.333 (9)   0.699 (9)   0.725 (9)   0.725 (9)   0.753 (6)   0.743 (10)  (8.3)
Med-HSIC    0.433 (14)  0.350 (5)   0.333 (6)   0.350 (8)   0.317 (7)   0.667 (13)  0.741 (7)   0.757 (4)   0.734 (9)   0.766 (7)   (8.0)
Mod-HSIC    0.367 (6)   0.333 (3)   0.300 (1)*  0.317 (4)   0.300 (2)   0.703 (8)   0.750 (4)   0.776 (3)   0.766 (3)   0.775 (4)   (3.8)
φ-Pr-HSIC   0.283 (1)*  0.283 (1)*  0.333 (6)   0.333 (7)   0.300 (2)   0.778 (1)*  0.785 (1)*  0.757 (4)   0.750 (7)   0.776 (3)   (3.3)
HSIC        0.450 (16)  0.467 (19)  0.467 (17)  0.500 (18)  0.500 (18)  0.615 (18)  0.597 (19)  0.622 (17)  0.583 (18)  0.584 (18)  (18.1)
Exp-Ratio   0.433 (14)  0.383 (10)  0.317 (4)   0.300 (2)   0.300 (2)   0.667 (13)  0.715 (10)  0.756 (6)   0.766 (3)   0.766 (7)   (7.1)
Med-Ratio   0.450 (16)  0.417 (15)  0.450 (16)  0.383 (11)  0.383 (11)  0.639 (17)  0.653 (16)  0.608 (20)  0.689 (12)  0.684 (11)  (14.5)
Mod-Ratio   0.317 (3)   0.350 (5)   0.433 (15)  0.417 (13)  0.467 (15)  0.776 (2)   0.744 (6)   0.659 (13)  0.657 (13)  0.612 (15)  (9.9)
φ-Pr-Ratio  0.400 (9)   0.317 (2)   0.300 (1)*  0.300 (2)   0.267 (1)*  0.692 (10)  0.764 (2)   0.784 (1)*  0.778 (2)   0.809 (1)*  (3.1)
Ratio       0.500 (19)  0.483 (20)  0.533 (22)  0.567 (22)  0.533 (20)  0.581 (20)  0.603 (18)  0.533 (21)  0.519 (22)  0.550 (20)  (20.4)
Exp-Gtest   0.300 (2)   0.367 (8)   0.317 (4)   0.350 (8)   0.383 (11)  0.774 (3)   0.693 (11)  0.729 (9)   0.702 (10)  0.672 (12)  (7.8)
Med-Gtest   0.517 (21)  0.450 (18)  0.400 (11)  0.500 (18)  0.483 (17)  0.562 (21)  0.597 (19)  0.655 (14)  0.567 (19)  0.589 (17)  (17.5)
Mod-Gtest   0.517 (21)  0.550 (22)  0.500 (21)  0.500 (18)  0.517 (19)  0.531 (22)  0.491 (22)  0.527 (22)  0.545 (20)  0.558 (19)  (20.6)
φ-Pr-Gtest  0.450 (16)  0.417 (15)  0.417 (13)  0.383 (11)  0.300 (2)   0.648 (16)  0.675 (14)  0.665 (12)  0.701 (11)  0.768 (6)   (11.6)
Gtest       0.500 (19)  0.500 (21)  0.467 (17)  0.433 (14)  0.550 (21)  0.583 (19)  0.580 (21)  0.612 (19)  0.656 (14)  0.547 (21)  (18.6)
Exp-Conf    0.367 (7)   0.333 (3)   0.300 (1)*  0.283 (1)*  0.300 (2)   0.744 (6)   0.762 (3)   0.780 (2)   0.795 (1)*  0.780 (2)   *(2.8)
Med-Conf    0.333 (4)   0.350 (5)   0.350 (8)   0.350 (8)   0.317 (7)   0.760 (4)   0.747 (5)   0.752 (7)   0.740 (8)   0.770 (5)   (6.1)
Mod-Conf    0.417 (12)  0.383 (10)  0.350 (8)   0.317 (4)   0.333 (9)   0.690 (11)  0.728 (8)   0.742 (8)   0.759 (5)   0.750 (9)   (8.4)
φ-Pr-Conf   0.400 (9)   0.417 (15)  0.467 (17)  0.467 (16)  0.433 (13)  0.685 (12)  0.648 (17)  0.619 (18)  0.592 (17)  0.632 (13)  (14.7)
Conf        0.400 (9)   0.400 (13)  0.417 (13)  0.450 (15)  0.467 (15)  0.655 (15)  0.667 (15)  0.645 (15)  0.618 (15)  0.610 (16)  (14.1)
Exp-Freq    0.383 (8)   0.383 (10)  0.400 (11)  0.467 (16)  0.433 (13)  0.705 (7)   0.685 (13)  0.675 (11)  0.607 (16)  0.632 (13)  (11.8)
Freq        0.350 (5)   0.400 (13)  0.483 (20)  0.550 (21)  0.550 (21)  0.747 (5)   0.692 (12)  0.627 (16)  0.539 (21)  0.547 (21)  (15.5)


Table 5: Results on the ADHD (Attention Deficit Hyperactivity Disorder) dataset with different numbers of
features (t = 100, ..., 500). The results are reported as "average performance + (rank)".

            Error Rate ↓                                                F1 ↑                                                        Avg.
Methods     t = 100     t = 200     t = 300     t = 400     t = 500     t = 100     t = 200     t = 300     t = 400     t = 500     Rank
Exp-HSIC    0.423 (10)  0.438 (13)  0.455 (14)  0.455 (11)  0.448 (12)  0.593 (10)  0.564 (13)  0.543 (14)  0.547 (11)  0.549 (12)  (12.0)
Med-HSIC    0.420 (9)   0.405 (8)   0.413 (8)   0.448 (10)  0.433 (6)   0.569 (13)  0.597 (7)   0.593 (5)   0.549 (10)  0.562 (7)   (8.3)
Mod-HSIC    0.390 (4)   0.405 (8)   0.403 (4)   0.393 (1)*  0.410 (2)   0.614 (3)   0.599 (6)   0.596 (4)   0.594 (1)*  0.584 (2)   *(3.5)
φ-Pr-HSIC   0.432 (12)  0.470 (17)  0.475 (16)  0.513 (22)  0.503 (21)  0.597 (7)   0.563 (14)  0.554 (13)  0.508 (17)  0.525 (18)  (15.7)
HSIC        0.529 (22)  0.510 (20)  0.488 (17)  0.455 (11)  0.485 (17)  0.505 (22)  0.494 (18)  0.498 (18)  0.538 (13)  0.526 (17)  (17.5)
Exp-Ratio   0.388 (3)   0.400 (5)   0.415 (10)  0.440 (8)   0.420 (4)   0.613 (4)   0.604 (5)   0.587 (9)   0.556 (8)   0.576 (4)   (6.0)
Med-Ratio   0.450 (16)  0.418 (11)  0.388 (1)*  0.428 (6)   0.410 (2)   0.554 (15)  0.586 (12)  0.619 (1)*  0.571 (5)   0.579 (3)   (7.2)
Mod-Ratio   0.400 (7)   0.370 (1)*  0.408 (5)   0.435 (7)   0.428 (5)   0.595 (8)   0.634 (1)*  0.591 (7)   0.558 (7)   0.560 (9)   (5.7)
φ-Pr-Ratio  0.372 (1)*  0.430 (12)  0.410 (7)   0.415 (2)   0.408 (1)*  0.630 (1)*  0.589 (9)   0.590 (8)   0.591 (2)   0.589 (1)*  (4.4)
Ratio       0.515 (20)  0.520 (21)  0.490 (18)  0.475 (17)  0.498 (19)  0.550 (16)  0.461 (22)  0.461 (21)  0.503 (19)  0.517 (20)  (19.3)
Exp-Gtest   0.393 (6)   0.403 (7)   0.413 (8)   0.420 (3)   0.435 (9)   0.610 (5)   0.588 (10)  0.582 (10)  0.586 (3)   0.563 (5)   (6.6)
Med-Gtest   0.437 (13)  0.400 (5)   0.408 (5)   0.420 (3)   0.453 (15)  0.559 (14)  0.590 (8)   0.600 (2)   0.580 (4)   0.551 (10)  (7.9)
Mod-Gtest   0.448 (15)  0.383 (4)   0.398 (2)   0.428 (5)   0.433 (6)   0.571 (12)  0.622 (4)   0.593 (5)   0.565 (6)   0.551 (10)  (6.9)
φ-Pr-Gtest  0.450 (16)  0.445 (14)  0.443 (12)  0.455 (11)  0.433 (6)   0.544 (19)  0.555 (16)  0.552 (12)  0.538 (13)  0.562 (7)   (12.6)
Gtest       0.440 (14)  0.505 (19)  0.501 (21)  0.486 (19)  0.471 (16)  0.542 (20)  0.492 (19)  0.490 (20)  0.499 (21)  0.534 (16)  (18.5)
Exp-Conf    0.405 (8)   0.415 (10)  0.453 (13)  0.455 (11)  0.448 (12)  0.595 (8)   0.587 (11)  0.539 (15)  0.543 (12)  0.535 (15)  (11.5)
Med-Conf    0.378 (2)   0.373 (2)   0.438 (11)  0.463 (15)  0.435 (9)   0.629 (2)   0.632 (2)   0.555 (11)  0.536 (15)  0.545 (13)  (8.2)
Mod-Conf    0.392 (5)   0.373 (2)   0.400 (3)   0.440 (8)   0.435 (9)   0.606 (6)   0.627 (3)   0.600 (2)   0.556 (8)   0.563 (5)   (5.1)
φ-Pr-Conf   0.468 (19)  0.460 (15)  0.495 (20)  0.505 (21)  0.485 (17)  0.547 (18)  0.556 (15)  0.519 (16)  0.507 (18)  0.540 (14)  (17.3)
Conf        0.455 (18)  0.500 (18)  0.460 (15)  0.464 (16)  0.450 (14)  0.514 (21)  0.479 (20)  0.510 (17)  0.498 (22)  0.519 (19)  (18.0)
Exp-Freq    0.423 (10)  0.465 (16)  0.508 (22)  0.498 (20)  0.505 (22)  0.579 (11)  0.549 (17)  0.496 (19)  0.513 (16)  0.498 (22)  (17.5)
Freq        0.515 (20)  0.520 (21)  0.490 (18)  0.475 (17)  0.498 (19)  0.550 (16)  0.461 (21)  0.461 (21)  0.503 (19)  0.517 (20)  (19.2)



4.3 Performance on Uncertain Graph Classification
In our experiments, we first randomly sample 80% of the
uncertain graphs as the training set and use the remaining
graphs as the test set. This random sampling experiment is
repeated 20 times, and the average performance of each
method is reported together with its rank. We use
classification performance to evaluate the quality of
subgraph features because classification methods can usually
achieve higher accuracy with features of better
discriminative power. We measure the classification
performance by error rate and F1 score.
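The two evaluation measures and the averaged rank can be written out as a short sketch. The helper names below are hypothetical, and the F1 score is assumed to be computed on the positive (patient) class; the rank list is read off the Exp-Gtest row of Table 5 (five error-rate ranks followed by five F1 ranks).

```python
def error_rate(y_true, y_pred):
    """Fraction of test graphs whose predicted label is wrong."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 of the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_rank(ranks):
    """Mean of the per-column ranks of one method."""
    return sum(ranks) / len(ranks)


# Toy predictions: 3 patients (1) and 3 controls (0).
labels = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
toy_error = error_rate(labels, predicted)  # 2 of 6 wrong
toy_f1 = f1_score(labels, predicted)       # TP = 2, FP = 1, FN = 1

# Ranks of Exp-Gtest on ADHD as listed in Table 5.
exp_gtest_adhd_ranks = [6, 7, 8, 3, 9, 5, 10, 10, 3, 5]
exp_gtest_adhd_avg = average_rank(exp_gtest_adhd_ranks)
```

The averaged value agrees with the (6.6) reported for Exp-Gtest in the last column of Table 5, which suggests that the average rank is the plain mean over the ten columns.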

Table 4 and Table 5 show the evaluation results in terms of
classification error rates and F1 scores with different
numbers of selected subgraph features (t = 100, ..., 500).
The results of each method are shown as average performance
values together with their ranks among all compared methods.
Values marked with * are the best performance for the
corresponding evaluation criterion. It is worth noting that
neuroimaging tasks are generally very hard to predict
accurately. According to a global competition on the ADHD
dataset⁷, the average performance of all winning teams is
about 8% over the prediction accuracy of chance (i.e.,
randomly assigning diagnoses). Thus, in neuroimaging tasks,
it is very hard for classification algorithms to achieve
even moderate error rates. On the ADHD dataset, the best
performance that Dug achieves is an error rate of 37%, which
is a 13% improvement over the error rate of chance.

7 http://www.childmind.org/en/posts/press-releases/2011-10-12-johns-hopkins-team-wins-adhd-200-competition

Table 6: Results on the HIV (Human Immunodeficiency Virus) dataset with different numbers of features
(t = 100, ..., 500). The results are reported as "average performance + (rank)".

            Error Rate ↓                                                F1 ↑                                                        Avg.
Methods     t = 100     t = 200     t = 300     t = 400     t = 500     t = 100     t = 200     t = 300     t = 400     t = 500     Rank
Exp-HSIC    0.480 (15)  0.470 (10)  0.489 (12)  0.505 (16)  0.498 (13)  0.526 (13)  0.531 (8)   0.517 (11)  0.491 (14)  0.492 (13)  (12.5)
Med-HSIC    0.498 (17)  0.500 (18)  0.470 (7)   0.484 (11)  0.507 (16)  0.501 (18)  0.493 (18)  0.526 (8)   0.510 (10)  0.474 (16)  (13.9)
Mod-HSIC    0.502 (18)  0.489 (15)  0.482 (11)  0.498 (14)  0.500 (14)  0.501 (18)  0.501 (16)  0.495 (14)  0.481 (17)  0.467 (19)  (15.6)
φ-Pr-HSIC   0.523 (19)  0.511 (19)  0.516 (18)  0.525 (19)  0.523 (20)  0.484 (20)  0.492 (19)  0.481 (16)  0.474 (19)  0.482 (14)  (18.3)
HSIC        0.464 (6)   0.495 (17)  0.566 (21)  0.500 (15)  0.505 (15)  0.526 (13)  0.460 (20)  0.405 (21)  0.489 (15)  0.471 (18)  (16.1)
Exp-Ratio   0.475 (13)  0.477 (11)  0.491 (13)  0.516 (18)  0.484 (8)   0.541 (8)   0.533 (7)   0.509 (13)  0.477 (18)  0.519 (8)   (11.3)
Med-Ratio   0.466 (8)   0.464 (8)   0.470 (7)   0.457 (5)   0.473 (6)   0.541 (8)   0.528 (9)   0.524 (9)   0.534 (6)   0.521 (6)   (7.2)
Mod-Ratio   0.450 (3)   0.452 (5)   0.466 (4)   0.480 (9)   0.484 (8)   0.558 (5)   0.547 (5)   0.528 (6)   0.509 (11)  0.500 (12)  (6.8)
φ-Pr-Ratio  0.473 (11)  0.480 (12)  0.466 (4)   0.470 (8)   0.468 (5)   0.544 (7)   0.519 (13)  0.538 (5)   0.531 (7)   0.538 (5)   (7.7)
Ratio       0.530 (21)  0.486 (13)  0.589 (22)  0.411 (1)*  0.520 (19)  0.456 (21)  0.495 (17)  0.376 (22)  0.562 (4)   0.443 (20)  (16)
Exp-Gtest   0.468 (9)   0.466 (9)   0.468 (6)   0.466 (7)   0.482 (7)   0.562 (4)   0.565 (4)   0.548 (4)   0.537 (5)   0.520 (7)   (6.2)
Med-Gtest   0.464 (6)   0.461 (7)   0.507 (17)  0.507 (17)  0.511 (17)  0.534 (11)  0.520 (11)  0.480 (17)  0.483 (16)  0.474 (16)  (10.9)
Mod-Gtest   0.477 (14)  0.486 (13)  0.475 (10)  0.491 (13)  0.489 (11)  0.529 (12)  0.507 (14)  0.523 (10)  0.497 (13)  0.501 (11)  (12.1)
φ-Pr-Gtest  0.430 (1)*  0.420 (2)   0.425 (1)*  0.418 (2)   0.425 (2)   0.617 (1)*  0.633 (1)*  0.630 (1)*  0.637 (1)*  0.633 (1)*  *(1.3)
Gtest       0.473 (11)  0.550 (21)  0.493 (14)  0.534 (20)  0.493 (12)  0.514 (16)  0.426 (22)  0.491 (15)  0.509 (11)  0.477 (15)  (15.7)
Exp-Conf    0.457 (4)   0.430 (4)   0.441 (2)   0.443 (4)   0.441 (3)   0.576 (3)   0.590 (2)   0.572 (2)   0.570 (3)   0.573 (4)   (3.1)
Med-Conf    0.445 (2)   0.427 (3)   0.441 (2)   0.441 (3)   0.443 (4)   0.579 (2)   0.588 (3)   0.572 (2)   0.579 (2)   0.574 (3)   (2.6)
Mod-Conf    0.457 (4)   0.455 (6)   0.473 (9)   0.482 (10)  0.484 (8)   0.556 (6)   0.545 (6)   0.527 (7)   0.518 (9)   0.508 (9)   (7.4)
φ-Pr-Conf   0.534 (22)  0.552 (22)  0.545 (19)  0.548 (21)  0.541 (22)  0.454 (22)  0.443 (21)  0.444 (20)  0.443 (22)  0.438 (21)  (21.2)
Conf        0.468 (9)   0.416 (1)*  0.502 (15)  0.489 (12)  0.339 (1)*  0.515 (15)  0.528 (9)   0.468 (19)  0.462 (20)  0.621 (2)   (10.3)
Exp-Freq    0.525 (20)  0.520 (20)  0.548 (20)  0.550 (22)  0.527 (21)  0.503 (17)  0.520 (11)  0.473 (18)  0.457 (21)  0.423 (22)  (19.2)
Freq        0.489 (16)  0.489 (15)  0.502 (15)  0.461 (6)   0.514 (18)  0.535 (10)  0.505 (15)  0.517 (11)  0.520 (8)   0.502 (10)  (12.4)

We find that our discriminative subgraph mining
method, under its different settings, outperforms the
frequent subgraph mining baseline (Exp-Freq), which selects
subgraph features based upon expected frequencies in the
uncertain graph dataset. This is because frequent subgraph
features in an uncertain graph dataset may not be relevant
to the classification task.

Moreover, we can see that almost all the Dug
methods outperform the simple thresholding methods,
which directly convert the uncertain graphs into certain
graphs and then use different discrimination functions
to select subgraph features. This is because simply
converting uncertain graphs into certain graphs loses
the uncertainty information about the linkage structures
of the graphs, so the classification performance on
certain graphs is not as good as the performance on
uncertain graphs.
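A small numeric sketch makes the information loss concrete. It assumes that edges are independent, that all graphs share one node set (so a subgraph is contained exactly when all of its edges are present), and a hypothetical threshold of 0.5; none of these function names or values come from the paper.

```python
def threshold_graph(edge_probs, tau=0.5):
    """Simple thresholding: keep an edge as certain if its probability is >= tau."""
    return {edge for edge, prob in edge_probs.items() if prob >= tau}


def contains_certain(edges, subgraph_edges):
    """In a certain graph the subgraph is either present or absent."""
    return all(edge in edges for edge in subgraph_edges)


def containment_probability(edge_probs, subgraph_edges):
    """Probability that every subgraph edge exists, assuming independent edges."""
    prob = 1.0
    for edge in subgraph_edges:
        prob *= edge_probs.get(edge, 0.0)
    return prob


# Hypothetical uncertain graph: three links, each present with probability 0.6.
brain = {(0, 1): 0.6, (1, 2): 0.6, (0, 2): 0.6}
triangle = [(0, 1), (1, 2), (0, 2)]

after_threshold = contains_certain(threshold_graph(brain), triangle)
under_uncertainty = containment_probability(brain, triangle)
```

After thresholding, the triangle is treated as certainly present, whereas under the uncertain model it is present with probability 0.6 × 0.6 × 0.6 = 0.216; the two views can therefore rank the same feature very differently.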

A third observation is that the performance of each
method can be quite different on different datasets.
However, the methods that rank consistently well across
all datasets are Med-Conf and φ-Pr-Ratio. Each has its own
advantage. Med-Conf has one fewer parameter than
φ-Pr-Ratio. φ-Pr-Ratio has an additional subgraph pruning
bound compared to Med-Conf, which can be important for
datasets with even larger graphs.



4.4 Influence of Parameter In the φ-Pr based
methods, there is one additional threshold parameter, φ,
compared with the other methods. In Figure 7(a) and
Figure 7(b) we tested φ-Pr-HSIC with φ values in
{0.01, 0.02, ..., 0.06}. We can see that the method is not
sensitive to the parameter φ. In general, φ-Pr-HSIC
performs well with the default setting (φ = 0.03), and
tuning the value of φ can improve the accuracy further.

We also compare Dug models with and without
pruning in the subgraph search space, as summarized
in Figure 7(c), which reports the CPU time with different
minsup values for Exp-HSIC on the ADNI dataset. Dug
improves efficiency by pruning the subgraph search
space, and it shows similar trends on the other datasets.
Figure 8 shows the running time of Mod-HSIC with
different numbers of graphs in the dataset. In addition
to the dynamic programming method used in Dug,
we also tested a brute-force search that enumerates all
possible worlds of the uncertain graph dataset, and found
that it cannot handle even small datasets with 40 graphs.
The running time of Dug scales almost linearly with the
number of graphs in the dataset, although the dynamic
programming process of Dug is O(n²), i.e., quadratic in
the number of graphs in the dataset. This is because, in
the ADHD dataset, the main computational cost of the Dug
algorithm lies in the subgraph enumeration step, which is
linear in the number of graphs in the dataset. For even
larger datasets, the dynamic programming process could
eventually dominate the computational cost. In these
cases, the divide-and-conquer method in [19] could be
used to further reduce the computational cost.


Figure 7: Parameter Studies (ADNI dataset).

Figure 8: Running time on ADNI dataset.

5 Related Work

Our work is related to subgraph mining techniques for
both certain graphs and uncertain graphs. We briefly
discuss both of them.

Mining subgraph features in graph data has been
studied intensively in recent years [15]. Most of the
previous research has focused on certain graph
datasets, where the edges of the graph objects are
binary/certain. The aim of these subgraph mining
methods is to extract important subgraph features based
on the structure of the graphs. Depending on whether
the class labels are considered in the feature mining
steps, existing methods can roughly be categorized into
two types: frequent subgraph mining and discriminative
subgraph mining. In frequent subgraph mining, Yan
and Han proposed a depth-first search algorithm, gSpan
[23], which maps each graph to a unique minimum
DFS code and uses the right-most extension technique for
subgraph extension. Many other frequent subgraph mining
methods have also been proposed, e.g., AGM [5], FSG [15],
MoFa [JJ, and Gaston [16]. Discriminative subgraph mining
has also been studied intensively in the literature, e.g.,
LEAP [32] and LTS [10], where the task is to find
discriminative subgraphs for graph classification.

Recently, there has been a growing interest in
exploiting data uncertainty, especially structural
uncertainty in graph data. There are some recent works on
mining frequent subgraph features for uncertain graphs
[27] [26] [28] [17]. Mining frequent subgraphs in
uncertain graphs is more difficult than in certain graphs.
The authors of [27] proposed a method to approximately
estimate the expected support of a subgraph feature in
uncertain graph datasets. In [26], the authors study the
φ-probabilities of frequent subgraph features within
uncertain graph datasets. The differences between these
works and our paper are as follows: 1) In this paper, we
study how to find discriminative subgraph features for
uncertain graph data. The class labels of the graph objects
are considered during the subgraph mining. 2) The graph
model in our paper is different from previous uncertain
graph data, since we assume that different graph objects
share the same set of nodes, as inspired by the neuroimaging
applications. Thus, our method computes the exact
discrimination scores of each subgraph feature, instead of
approximate scores. There are also many other works on
uncertain graphs, which focus on different problems, e.g.,
reliable subgraph mining [12], k-nearest neighbor discovery
[18], subgraph retrieval [H], etc.

Our work is also motivated by the recent advances
in analyzing neuroimaging data using data mining and
machine learning approaches [6] [8] [25]. Huang
et al. [6] developed a sparse inverse covariance
estimation method for analyzing brain connectivity in
PET images of patients with Alzheimer's disease.



6 Conclusion 

In this paper, we studied the problem of discriminative
subgraph feature selection for uncertain graph
classification. We proposed a general framework, called Dug,
for finding discriminative subgraph features in uncertain
graphs based upon various statistical measures. The
probability distributions of the scoring function are
efficiently computed based on dynamic programming.